Abstract

We report the development of a simple but fast and efficient inductive, unsupervised semantic tagger for
Chinese words. A hand-POS-tagged corpus of 348,000 words is used. The corpus is tagged in two
steps. First, possible semantic tags are selected using a semantic dictionary (Tong Yi Ci Ci Lin), the POS
and the conditional probability of a semantic class given the POS, i.e., P(S|P). The final semantic tag is then
assigned by considering the semantic tags before and after the current word and the semantic-word
conditional probability P(S|W) derived from the first step. Semantic bigram probabilities P(S|S) are used
in the second step. Final manual checking shows that this simple but efficient algorithm has a hit rate of
91%. The tagger tags 142 words per second on a 120 MHz Pentium running FOXPRO. It runs about
2.3 times faster than a Viterbi tagger.

1. Introduction 

Word Sense Disambiguation (WSD) has been an important research area in NLP for many years
(Black 1988, Bruce 1994 and 1995, Harder 1993, Lam 1995, Leacock 1993, Luk 1995, Ng 1995, Ng
1996, McRoy 1992, Miller 1994, Yarowsky 1995, Zernik 1990). The reported accuracy of
disambiguation varies from 72% (Black 1988) to 90% (Ng 1996). For the Chinese language, unfortunately,
there have been far fewer reports on WSD.

Lam (1995) applied a linguistic-based word sense disambiguation algorithm for Chinese (LSD-C).
This system does not require training. It relies on two dictionaries, Xiandai Hanyu Cidian and Tong Yi Ci
Ci Lin (Mei, 1983 and 1992), and achieves hit rates from 36% to 57.6% (with an average of 45.60%).
The hit rates seem rather low. However, the statistics cover only ambiguous words, and the test
sample has an average of 3.4 senses per word.

Traditionally, part-of-speech plays a major role in the analysis of sentences. In Western
languages, the functional role a word plays in a sentence is almost entirely determined by its
part-of-speech, or syntactic category. In Chinese, on the contrary, it is almost impossible to establish a
one-to-one association between the part-of-speech of a word and its functional role (Wan 1989, Wu 1982,
Zhang 1986). For example, in Chinese, a noun can be a subject, predicate, object or attributive (see Table 1,
adapted from Zhang 1986, page 155). The only role a noun is not allowed to play is that of a complement.



Table 1: Functional Roles of Chinese Part-of-Speech

Symbols: O: frequently allowed; V: allowed; ?: conditionally allowed, allowed only in a few cases; X: not
allowed.



This creates a problem for Chinese sentence analysis using part-of-speech. In a standard
textbook of the Chinese language, sentence structures are analysed according to the roles of the words in
the sentences. It is therefore important to find out whether the semantic class (or sense) of a Chinese word
plays a key role in analysing Chinese sentences. To do so, we need a Chinese text that is semantically
tagged.



In the absence of a semantically tagged corpus, we have to use an unsupervised approach. To
make this possible, we adopt two strategies: (1) induction and (2) divide-and-conquer.

Using the first strategy, we hand-tagged a small section of the corpus of about 17,000 words.
From this section we calculate P(S|P), P(S|W) and P(S|S), which are the conditional probabilities of
a semantic class (S) given the part-of-speech (P), of a semantic class (S) given the word (W), and of
semantic bigrams. We use these parameters to guide the subsequent tagging of larger and larger corpora.
In the end, a corpus of 348,000 words is tagged.

For the second strategy, we divide the tagging into two phases. First, we identify the possible tags;
second, we compute the most likely tag from the list of possible tags. We make our preliminary
selection of possible semantic tags by referring to a semantic classification dictionary, Tong Yi Ci Ci
Lin (CILIN; Mei 1983, 1992), and to the conditional probabilities of a semantic class given the
part-of-speech, i.e., P(S|P). The final selection of the most likely tag is based on the P(S|W) and P(S|S)
probabilities.

In this way, we develop a fast and efficient algorithm to semantically tag a Chinese corpus of
348,000 words to an accuracy of about 90%. The tagging algorithm also runs 2.3 times faster than the
Viterbi algorithm, one of the fastest tagging algorithms available.

After this introduction, Section 2 briefly describes our corpus, the part-of-speech tag set and the
semantic classification of CILIN. Section 3 explains our tagging algorithms in detail. Section 4 reports
the steps of tagging. Section 5 presents a simple error analysis and compares the speed of our tagging
algorithm with that of other approaches, namely the genetic algorithm and the Viterbi algorithm. Our
conclusion appears in Section 6.

2. Our Corpus and CILIN's Semantic Classes

We obtained a POS-tagged corpus from Tsing Hua University. This corpus is manually tagged
with a tag set of 113 parts-of-speech (Bai 1992, 1995; see also Table 2).

We extract a section of text of about 17,000 words from the corpus and manually tag it with
semantic classes according to Tong Yi Ci Ci Lin (Mei 1983, 1992). CILIN's semantic classification is
a three-layer hierarchical tree. There are 12 major, 95 middle and 1428 minor classes (Lua 1993a and
1993b). We select the middle layer (95 classes) and add the following classes:

Ma  numbers
Nd  name of place
Nr  name of person, including surnames
Pd  punctuation marks
Ud  others

We thus end up with a total of 100 semantic classes.

We select the middle classes as they match well with the 113 POS tags of the Tsing Hua system.
During hand tagging, we select the most appropriate class and assign it to the word according to its
meaning and POS in the sentence. In some cases, we have to provide a tag for the word manually. These
are: (1) when we decide that none of the classes in CILIN is appropriate and (2) when the word is
absent from CILIN. In case (2), we refer to a word in CILIN with a similar or the closest meaning.
For example, we refer to ^ for semantic classes of m§. As with any other hand-tagged corpus, we cannot
ensure that our tagging is 100% correct. However, as we show later, our system has a very high tolerance
to errors. There is actually no need to start with a 100% correctly tagged text.

From this hand-tagged text, we derive our first set of conditional probabilities: P(S|P),
P(S|W) and P(S|S). P(S|P) is the most useful, as we rely on it to select the preliminary set of semantic
tags for the word.
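
As an illustration of how such tables might be estimated, the sketch below uses plain relative-frequency counting over a hand-tagged sample. The (word, POS, semantic tag) token layout, the function name and the toy tags are assumptions made for this example; the original system ran under FOXPRO and its exact counting procedure is not given here.

```python
from collections import Counter, defaultdict

def estimate_tables(sentences):
    """Relative-frequency estimates of P(S|P), P(S|W) and P(S|S).

    `sentences` is a list of sentences, each a list of (word, pos, sem)
    triples. This layout is an assumption made for the example.
    """
    sp = defaultdict(Counter)  # pos  -> counts of semantic tags
    sw = defaultdict(Counter)  # word -> counts of semantic tags
    ss = defaultdict(Counter)  # previous semantic tag -> counts of current tag
    for sent in sentences:
        prev = None
        for word, pos, sem in sent:
            sp[pos][sem] += 1
            sw[word][sem] += 1
            if prev is not None:
                ss[prev][sem] += 1
            prev = sem

    def normalise(table):
        # Turn each row of counts into a row of probabilities.
        return {key: {s: c / sum(cnt.values()) for s, c in cnt.items()}
                for key, cnt in table.items()}

    return normalise(sp), normalise(sw), normalise(ss)

# Toy data: the words and tags are placeholders, not corpus entries.
toy = [[("w1", "n", "Di"), ("w2", "v", "Hj"), ("w3", "n", "Di")],
       [("w1", "n", "Bn"), ("w2", "v", "Hj")]]
p_s_given_p, p_s_given_w, p_s_given_s = estimate_tables(toy)
```

In the toy sample, the POS tag "n" occurs three times, twice with "Di" and once with "Bn", so the estimate of P(Di|n) is 2/3.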

3. Development of Tagging Algorithm with a Small Hand-tagged Corpus (A)

We develop our tagging algorithm with the above-mentioned hand-tagged corpus (A). Our
approach is inductive because we use parameters derived from a small section of the corpus to tag larger
and larger sections of the corpus. We repeat the process until the whole corpus is tagged.



Table 2: Tsing Hua POS Tag Set

3.1 Selection of Possible Semantic Tags - the PICK program

We select the 7 most likely tags using CILIN and P(S|P). We set up a scoring system as follows.
For every word, we assign a score of 1 to each semantic class under which the word appears in CILIN.
We add to this score the value of P(S|P) for the POS assigned to the word. We then select, from all 100
semantic classes, the 7 classes with the highest scores. We place the tag with the highest score in the first
cell and name it SEM1, the tag with the second highest score in SEM2, and so on. We found that 73.47%
of the words have their semantic tags assigned by a combined score from CILIN and P(S|P). For the
remaining 26.53% of the words, the assignments are determined by P(S|P) alone. Note that
0 < P(S|P) < 1 and the total score is < 2.
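
The scoring rule can be sketched as follows. This is a minimal illustration under stated assumptions: CILIN is represented as a mapping from a word to its set of listed classes, P(S|P) as nested dictionaries, and ties are broken by class name. None of these choices is taken from the PICK program itself.

```python
def pick(word, pos, cilin, p_s_given_p, all_classes, k=7):
    """Rank semantic classes for one word and keep the top k.

    score(class) = 1 if CILIN lists the class for the word, else 0,
                   plus P(class | pos).
    """
    listed = cilin.get(word, set())
    pos_probs = p_s_given_p.get(pos, {})
    scored = [((1.0 if c in listed else 0.0) + pos_probs.get(c, 0.0), c)
              for c in all_classes]
    # Highest score first; ties are broken by class name for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [c for _, c in scored[:k]]

# Toy data: placeholder word, POS tag and classes.
cilin = {"w1": {"Di", "Bn"}}
p_s_given_p = {"n": {"Di": 0.5, "Bn": 0.2, "Ma": 0.3}}
classes = ["Bn", "Di", "Hj", "Ma"]
candidates = pick("w1", "n", cilin, p_s_given_p, classes, k=3)
```

For the toy word, the two classes listed in CILIN score above 1 and are ranked first; a word absent from CILIN would be ranked by P(S|P) alone.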



Table 3: Semantic Classes of CILIN

3.1.1 Hand-tagged Corpus A - 17,000 words

To determine the accuracy of the preliminary selection, we run PICK on the hand-tagged
corpus A. The hit rates vary from 78.69% to 99.87%, depending on the number of classes included (see
Table 4).

Table 4: Hit Rate of Preliminary Assignment

As the speed of tagging depends on the number of tags included in this preliminary selection (see Table
5), we decide to limit the number of tags for the final selection to 3. This sets an upper limit of 98.46% on
our tagging accuracy, a number that we consider to be much higher than what the current tagging
algorithm would achieve.

3.2 Tagging Algorithm - the TAG program

In the second step, we compute the most likely tag from the pool of tags selected by PICK.
In this program, we consider only the tags before and after the current one. We calculate 18 scores. These
are the conditional probabilities of the semantic bigrams, P(S|S), between each of the 3 candidate tags of
the current word and the 6 candidate tags of its two neighbours (see Figure 1), weighted by P(S|W) for the
word W under consideration.



Word(i-1)   SemTag1   SemTag2   SemTag3

Word(i)     SemTag1   SemTag2   SemTag3   P(S|W)

Word(i+1)   SemTag1   SemTag2   SemTag3

Figure 1: Computation of Score

$$\text{score} = P(S_i \mid W_i)\left[\sum_{j} P\big(S_i(W_i) \mid S_j(W_{i-1})\big) + \sum_{j} P\big(S_i(W_i) \mid S_j(W_{i+1})\big)\right]$$
where i, j = 1, 2, 3

In Figure 1, Word(i) is the current word. Word(i-1) and Word(i+1) are the words before and after the
current word. The weight between two semantic tags equals the bigram conditional probability P(S|S)
between them, taken once for the tags of the preceding word and once for the tags of the following word.
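
A minimal sketch of this scoring step is given below. It assumes that the bigram table is read as P(tag | neighbouring tag) on both sides and that unseen entries fall back to a small `floor` value; both are assumptions of the example rather than details taken from the TAG program.

```python
def tag_score(cand, prev_cands, next_cands, p_s_w, p_s_s, floor=1e-6):
    """Score one candidate tag of the current word.

    p_s_w: P(S|W) for the current word, as {tag: prob}.
    p_s_s: bigram table, as {(tag, neighbour_tag): prob}, read here as
           P(tag | neighbour_tag). `floor` stands in for unseen entries.
    """
    context = sum(p_s_s.get((cand, n), floor)
                  for n in list(prev_cands) + list(next_cands))
    return p_s_w.get(cand, floor) * context

def tag_word(cands, prev_cands, next_cands, p_s_w, p_s_s):
    """Return the candidate with the highest score."""
    return max(cands, key=lambda c: tag_score(c, prev_cands, next_cands,
                                              p_s_w, p_s_s))

# Toy data: three candidates per word, placeholder tags and probabilities.
prev_c = ["Hj", "Ka", "Ed"]
next_c = ["Hj", "Kc", "Ed"]
p_s_w = {"Di": 0.6, "Bn": 0.4}
p_s_s = {("Di", "Hj"): 0.3, ("Bn", "Hj"): 0.1}
best = tag_word(["Di", "Bn", "Ma"], prev_c, next_c, p_s_w, p_s_s)
```

With 3 candidates for the current word and 6 neighbouring candidates, the sketch evaluates 18 bigram terms per word, matching the count given in the text.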

Processing corpus (A) with this algorithm and with the preliminary data set of P(S|W) and P(S|S)
derived from the hand-tagged corpus, we obtain a hit rate of 86.4%. We consider this high enough for us
to begin our inductive process.

3.3 Repeated Tagging 

Before going to the larger corpus, we want to know whether we can repeatedly tag the corpus to obtain
better hit rates. We update the data (P(P|W), P(S|W) and P(S|S)) with the computer-tagged corpus A and
re-tag the corpus. The hit rate increases slightly to 87.4%. The improvement is thus rather limited.

3.4 Probabilities of Items Which Are Absent (Sparse Data Set Problem) 

One important decision to make during tagging is the probability assigned to items that do not
occur in the training data (the sparse data problem). Although our algorithm allows any value, we
believe it is better to select a number slightly less than half, i.e. 0.4. We set the
probability of non-occurring items to 0.4/(corpus size).

We experimented with several values in our tagging programs: 0.1, 0.2, 0.3, 0.4
and 1.0. We find that the smaller the value, the more the system selects a tag based on the current
parameters. This is undesirable, as we want our algorithm to play a major role in the search for correct
tags. We also do not want to leave the selection of tags entirely to the preliminary data set. Table 5 shows
the hit rates for the different values assigned to non-occurring items.
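
The treatment of unseen items can be sketched as below. The function name and argument layout are hypothetical; the sketch only encodes the rule that an item with a zero count receives 0.4/(corpus size), and it does not renormalise the seen probabilities, since no renormalisation is described.

```python
def smoothed_prob(count, total, corpus_size, unseen=0.4):
    """Relative frequency, with unseen items given unseen / corpus_size."""
    if count == 0 or total == 0:
        return unseen / corpus_size
    return count / total

# Corpus A has about 17,000 words; the counts below are placeholders.
p_seen = smoothed_prob(3, 12, 17000)     # 3 of 12 occurrences
p_unseen = smoothed_prob(0, 12, 17000)   # never observed
```

For a corpus of 17,000 words the unseen value is 0.4/17000, roughly 2.4e-5, which is smaller than the probability of an item seen once and so never outranks observed evidence.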



Table 5: Hit Rates and Occurrence Values for Non-occurring Items

4. Tagging



With the parameters extracted from the small hand-tagged corpus, we proceed to the preliminary
selection of semantic tags for a larger corpus. We divide this larger corpus into 2 parts of about equal
size, corpus B and corpus C, each having about 170,000 words. In this way, we can obtain a better picture
of how the tagger performs. Corpus B was processed first.

4.1 Tagging of Corpus B - 179,159 words

We run PICK on Corpus B. The tags of 73.47% of the words are selected by CILIN and P(S|P),
while the remaining 26.53% are selected by P(S|P) alone. However, before executing the TAG program, we
have to solve a problem. Corpus B has not been tagged before, and therefore the parameters P(S|W) are
lacking. We do not wish to use P(S|W) derived from Corpus A, as Corpus B is 10 times larger and
contains far more unique words. Corpus A has about 2300 unique words while Corpus B has
more than 10,000 unique words.

We decide to do a little 'repair' to CILIN. We update the semantic classes in CILIN by referring
to the P(S|W) data obtained from Corpus A. We find that 94.90% of the tags are now selected by the
updated CILIN and P(S|P), and only 5.10% are selected by P(S|P) alone.

Next, we use the tags in SEM1 as the preliminary semantic classes for computing P(S|W) and
P(S|S). We know that SEM1 contains only 78.68% of the correct tags (see Table 4).

In the next round, we run TAG to select the most likely tags. Then we update the parameters P(S|P),
P(S|S) and P(S|W) and re-tag the corpus with TAG. We re-tag the corpus only once, as we find that the
two tag sets differ by only 2% (98% in agreement).
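
The agreement check between two tagging passes can be sketched as follows. The stopping rule (stop once successive passes agree on at least 98% of the tags) is an assumption that formalises the observation above, and `retag` is a hypothetical stand-in for updating the parameters and running TAG again.

```python
def agreement(tags_a, tags_b):
    """Fraction of positions where two tag sequences agree."""
    if len(tags_a) != len(tags_b):
        raise ValueError("tag sequences must have equal length")
    if not tags_a:
        return 1.0
    same = sum(a == b for a, b in zip(tags_a, tags_b))
    return same / len(tags_a)

def retag_until_stable(tags, retag, threshold=0.98, max_rounds=5):
    """Re-tag until successive passes agree on at least `threshold`."""
    for rounds in range(1, max_rounds + 1):
        new_tags = retag(tags)
        if agreement(tags, new_tags) >= threshold:
            return new_tags, rounds
        tags = new_tags
    return tags, max_rounds

# Toy run: a dummy re-tagger that always answers "Di".
start = ["Di", "Bn", "Di", "Di"]
final, rounds = retag_until_stable(start, lambda tags: ["Di"] * len(tags))
```

In the toy run the first pass changes one tag in four (75% agreement), so a second pass is made; that pass changes nothing and the loop stops.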

4.2 Tagging of Corpus C - 167,234 words 

For corpus C, the PICK program selects 71.56% of the tags by referring to CILIN (the original CILIN,
not the updated one) and P(S|P), and the remaining 28.44% from P(S|P) alone. We again update CILIN
with P(S|W), this time generated from Corpus B (no longer from Corpus A). The corresponding rates now
change to 92.92% and 7.09%. The subsequent tagging steps are identical to those described in Section 4.1.

4.3 Tagging of Whole Corpus of 348,393 words 

At the final stage, Corpora B and C are combined into a single corpus, called Corpus Z, and the
whole corpus is tagged.

5. Results 

We select 2000 semantic tags from corpus Z and check them manually. A total of 197 errors are
found, which gives our tagging a hit rate of 90.1%. It is difficult to present a detailed analysis of the
error pattern based on this small error sample. However, we can still classify the errors into the following
types:

1. Errors caused by wrong or inadequate CILIN classification

2. Errors caused by wrong POS tagging

3. Errors caused by the tagger

5.1 Errors 

CILIN has several types of errors. The first is wrong classification by the
authors. The 70,000 words were classified and collated manually by the 4 authors over a span of about 10
years. There are many instances of inconsistency and of wrong entries in the dictionary. This is even more
serious for us, as we obtain the entries not from the main text but from the indices, to which the authors
paid much less attention in their checking and verification.

The second problem is our own entry of the CILIN data from the dictionary into the computer.
We have discovered an error rate of about 1-2% in the data entry. We have not corrected these erroneous
entries because of the huge number of words. The last and more serious problem is the inadequacy of the
word entries in the dictionary. Many words commonly used today are not included in the dictionary.

11% of the total errors are identified as being caused by wrong semantic entries in CILIN. The
incompleteness of CILIN produces another 8 errors (4%), arising from four words absent from the
dictionary (5 errors, 1 error, 1 error and 1 error respectively).

The POS tagging of the corpus is also not 100% correct. For example, all idioms are classified as
'i'. This produces 2 errors. We also contributed 6 errors by not considering class 's' as names of
places, for example, 上海 (Shanghai).

We may conclude that about 19 errors come from CILIN and 8 errors from POS tagging and our
preliminary semantic tagging. Our hit rate would improve by 0.95% if these errors were
eliminated.
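
The arithmetic behind these figures can be checked with the short calculation below. It uses only the counts quoted in this section and assumes that each error counts as one of the 2000 checked tags.

```python
checked = 2000          # manually checked tags (from the text)
errors = 197            # errors found (from the text)
hit_rate = 100 * (checked - errors) / checked

cilin_errors = 19       # attributed to CILIN (from the text)
pos_errors = 8          # attributed to POS and preliminary tagging (from the text)
gain_cilin = 100 * cilin_errors / checked
gain_all = 100 * (cilin_errors + pos_errors) / checked

print(f"hit rate: {hit_rate:.2f}%")
print(f"gain if the CILIN errors are removed: {gain_cilin:.2f} points")
print(f"gain if the POS-related errors are removed as well: {gain_all:.2f} points")
```

On this reading, 19 errors out of 2000 correspond to 0.95 percentage points, while removing all 27 errors would correspond to 1.35 points.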

5.2 Number of Semantic Tags per Word 

The number of semantic tags associated with each word is an important factor to look at. From
Table 6, we find that the average number of semantic tags per word is 1.11. Comparing this number with
the 1.20 semantic classes per word obtained by direct counting from CILIN (Lua 1993a, 1993b), we find
that Chinese words are less ambiguous when they appear in text.



Table 6: Semantic Classes per Word

5.3 Distribution of Semantic Classes

We can also compare the dynamic and static distributions of the semantic classes (see Table 7).
It is quite interesting to observe that the two distributions agree with each other very well (Lua 1993a and
1993b). The only exceptions are classes B and K. For class B, the collection in CILIN is much larger
than what is actually used. For class K, although the number of words in CILIN is quite small, their
actual usage is very high. K words are function words that serve as grammar markers in a sentence.



Table 7: Distribution of Semantic Classes

5.4 Number of Semantic Classes per POS Class

The number of semantic classes associated with one POS class is an important indicator for
evaluating the usefulness of the semantic tagging. The semantic tags would be redundant if there were a
one-to-one association between the two tag sets. The statistics are given in Table 8.



Table 8: Number of semantic tags associated with each POS tag

It can be seen from the table that most POS tags are associated with 2-5 semantic tags. It is therefore quite
clear that semantic tags provide information in addition to that provided by POS.
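
A count of this kind could be obtained as sketched below. The (word, POS, semantic tag) token layout and the function name are assumptions made for the example, and the toy tokens are placeholders rather than corpus data.

```python
from collections import defaultdict

def classes_per_pos(tagged_tokens):
    """Map each POS tag to the number of distinct semantic tags seen with it."""
    seen = defaultdict(set)
    for _word, pos, sem in tagged_tokens:
        seen[pos].add(sem)
    return {pos: len(sems) for pos, sems in seen.items()}

# Toy tokens: (word, POS tag, semantic tag).
tokens = [("w1", "n", "Di"), ("w2", "n", "Bn"),
          ("w3", "v", "Hj"), ("w4", "n", "Di")]
counts = classes_per_pos(tokens)
```

A POS tag whose count is 1 carries the same information as its semantic tag; counts above 1 indicate that the semantic tag adds a distinction that the POS tag alone does not make.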



5.5 Comparison with Genetic and Viterbi Algorithms

We considered 2 alternative ways of tagging before we worked on the current approach. These
are: (1) Genetic Algorithm (GA Tagger) and (2) Viterbi Algorithm.

5.5.1 GA Tagger

In the GA tagger that we developed to tag the corpus, we use P(P|P), P(P|W), P(S|P), P(S|W) and
P(S|S) to compute the fitness function. This allows us to tag both POS and semantic classes
simultaneously, with or without the preliminary tag selection described in Section 3. For Corpus A, we
have 100% hit rates. This is because the GA, with all its parameters, memorizes the complete tag set. This
type of tagging is meaningless.

We next attempt the outside test. This is done by removing a sentence from the corpus and
re-computing all the probabilities. We then use the new set of parameters to tag the sentence.
Experimenting with this approach on 20 sentences selected for the outside test, the average hit rates are
80.2% for semantic and 79.7% for POS tagging. The average tagging speed is 48 s per word, using a
120 MHz Pentium PC running Visual FOXPRO Ver 3.0.

To reduce the long tagging time, we pre-select 3 tags using the PICK program and then tag the
sentences with the GA tagger. The processing time drops to 40 ms per word. We eventually
abandoned this algorithm as the GA approach is the slowest among the three.

5.5.2 Viterbi Tagger

The Viterbi algorithm at first appears to be a very good alternative to GA because of its high speed. It
gives the optimal solution. However, to our surprise, it is slower than the simple approach we developed
for this project. The Viterbi tagger spends 16 ms to tag a word while our simple algorithm spends less than
half of this amount, i.e., 7 ms per word. A comparison chart appears in Table 9. Note that GA and Viterbi
select the best tag from a pool of 100 (POS or semantic) or 200 (POS and semantic simultaneously) tags,
whereas our algorithm selects one from a set of 3 tags.
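
For comparison, a textbook Viterbi search over per-word candidate lists is sketched below. It is a generic illustration and not the Viterbi tagger timed in this paper; the table layouts, the `floor` value for unseen entries and the toy tags are all assumptions of the example.

```python
import math

def viterbi(candidates, emit, trans, floor=1e-6):
    """Best tag sequence over per-word candidate lists.

    candidates: list of candidate-tag lists, one per word.
    emit:  list of {tag: P(tag | word)} dicts, one per word.
    trans: {(prev_tag, tag): P(tag | prev_tag)}.
    """
    if not candidates:
        return []
    # best[t] = (log score of the best path ending in tag t, that path)
    best = {t: (math.log(emit[0].get(t, floor)), [t]) for t in candidates[0]}
    for i in range(1, len(candidates)):
        new_best = {}
        for t in candidates[i]:
            e = math.log(emit[i].get(t, floor))
            # Extend every surviving path by tag t and keep the best one.
            score, path = max(
                ((s + math.log(trans.get((p, t), floor)) + e, pth)
                 for p, (s, pth) in best.items()),
                key=lambda x: x[0])
            new_best[t] = (score, path + [t])
        best = new_best
    return max(best.values(), key=lambda x: x[0])[1]

# Toy data: two words with two candidate tags each.
cands = [["Di", "Bn"], ["Hj", "Ka"]]
emit = [{"Di": 0.6, "Bn": 0.4}, {"Hj": 0.5, "Ka": 0.5}]
trans = {("Di", "Hj"): 0.7, ("Bn", "Ka"): 0.2}
path = viterbi(cands, emit, trans)
```

With T candidate tags per word, each step of this search examines T x T transitions, so a pool of 100 tags costs 10,000 transition lookups per word, whereas scoring 3 candidates against 6 neighbouring tags needs only 18.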



Table 9: Speed of Tagging

Tag Set
Genetic Algorithm    64 s
Viterbi              16 ms    1.5 s

We finally abandoned both GA and Viterbi because of their longer processing time. Their possibly higher
hit rates are not considered an advantage, as they require a tagged corpus to act as training examples. In
the absence of such a corpus, the higher hit rates cannot materialize.

6. Conclusion 

Starting from a hand-POS-tagged corpus, we developed a simple inductive unsupervised process
for assigning semantic tags to a corpus of 348,393 words. The overall hit rate is estimated to be 90.1%.
Further analysis shows that errors amounting to 0.95% of the checked tags are caused by the semantic
dictionary, POS tagging and the preliminary semantic tagging, so the actual performance of our tagger is
91.05%.

We compare the processing speed of our tagger with that of 2 other types of taggers. The current
tagger tags 2.3 times faster than the Viterbi tagger, one of the most efficient taggers. Together with such a
high hit rate, we consider our tagging algorithms fast and efficient. Many useful parameters are derived
from this project. These are P(S|P), P(S|W), P(P|W), P(P|P) and P(S|S). They can be used to tag
other corpora.

One limitation of the current research is the narrow scope of our corpus, which contains only news
items. We suspect that parameters derived from this corpus may not be generally applicable to the
tagging of other types of text. For example, literary works can differ significantly from news
text.

In our next project, we will attempt to tag CKIP, a corpus built by ROCLING. This corpus has
been POS-tagged by hand and contains a balanced mix of different types of text. It is a far better
corpus from which more reliable parameters about POS and semantic classes can be derived.

References

Bai, S.H., Xia, Y. and Huang, C.N., 1992, Automatic Part-of-Speech Tagging System of Chinese,
Technical Report, Tsing Hua University, Beijing.

Bai, Shuanhu, 1995, An Integrated Model of Chinese Word Segmentation and Part-of-Speech Tagging, In
Advanced and Applications on Computational Linguistics, pp. 56-61, Third National Computational
Linguistics Meeting, 5-7 Nov, 1995, Shanghai.

Black, Ezra, 1988, An experiment in Computational discrimination of English Word Senses, IBM 
Journal of Research and Development, 32(2): 185-194. 



Brill, Eric, 1992, A simple rule-based part-of-speech tagger, Proceedings of the 3rd Conference on Applied
Natural Language Processing (ACL), pp. 152-155.

Bruce, R., 1995, A Statistical Method for Word Sense Disambiguation, Ph.D. Thesis, New Mexico State
University.

Bruce, Rebecca and Janyce Wiebe, 1994, Word Sense Disambiguation Using Decomposable Models, In
Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New
Mexico.

CKIP - Chinese Knowledge Information Processing Group, Technical Report 95-02, Institute of 
Information Science, Academia Sinica (Taiwan). 

Fogel, David B., 1995, Evolutionary Computation, IEEE Press.

Harder L. B., 1993, Sense Disambiguation Using On-Line Dictionaries, Natural Language Processing: 
The PLNLP Approach, Kluwer Academic Publishers, 247-261. 

Kupiec, J., 1992, Robust part-of-speech tagging using a Hidden Markov Model, Computer Speech and
Language, Vol 6, No 3, pp. 225-242.

Lam Sze-Sing, Vincent Y. Lum, Kam-Fai Wong, 1995, Determination of Word Sense In Chinese Full
Text Using A Standard Dictionary and Thesaurus, Proceedings of the 1995 International Conference on
Computer Processing of Oriental Languages, Honolulu, Hawaii, Nov 23-25, 1995, pp. 247-250.

Lin, M. Y. and W. H. Tsai, 1987, Removing the ambiguity of phonetic Chinese input by the relaxation
technique, Computer Processing of Chinese and Oriental Languages, Vol 3, No 1, May, pp. 1-24.

Leacock, Claudia, Geoffrey Towell and Ellen Voorhees, 1993, Corpus-based statistical Sense Resolution,
In Proceedings of the ARPA Human Language Technology Workshop.

Lin, Y.C., T. H. Chiang and K.Y. Su, 1992, Discrimination oriented probabilistic tagging, Proceedings of
ROCLING V, pp. 87-96.

Liu, Shing-Huan, Ken-jiann Chen, Li-ping Chang and Yeh-Hao Chin, 1995, Automatic Part-of-speech
tagging for Chinese corpora, Computer Processing of Chinese and Oriental Languages, Vol 9, No 1,
pp. 31-47.

Lua, K. T., 1993a, A Study of Chinese Word Semantics, Computer Processing of Chinese and Oriental
Languages, Vol 7, No 1, pp. 37-60.

Lua, K. T., 1993b, A Study of Chinese Word Semantics and Its Prediction, Computer Processing of
Chinese and Oriental Languages, Vol 7, No 2, pp. 167-180.

Luk, Alpha K., 1995, Statistical Sense Disambiguation With Relatively Small Corpora Using Dictionary
Definition, In Proceedings of the 33rd Annual General Meeting of Association for Computational
Linguistics, Cambridge, Massachusetts.

McRoy, Susan W., 1992, Using Multiple Knowledge Sources For Word Sense Discrimination,
Computational Linguistics, 18(1): 1-30.

Mei, Jiaju, Zhu Yiming, Gao Yunqi and Yin Hongxiang, 1983, Tong Yi Ci Ci Lin, Shanghai
Dictionaries Publisher.

Mei, Jiaju and Gao, Yunqi, 1992, A Study of the formalization of semantics, Communications of
COLIPS, Vol 2, No 1, pp. 40-27.



Miller, George A., Martin Chodorow, Shari Landes, Claudia Leacock and Robert G. Thomas, 1994, Using
A Semantic Concordance For Sense Identification, In Proceedings of the ARPA Human Language
Technology Workshop.

Ng, Hwee Tou, 1995, Word Sense Disambiguation Via Exemplar-Based Classification: A Case Study,
Proceedings of NUS Inter-Faculty Seminar on Natural Language Processing, 8-9 Sept. 1995, pp. 1-9.

Ng, Hwee Tou and Hian Beng Lee, 1996, Integrating Multiple Knowledge Sources to Disambiguate
Word Sense: An Exemplar-Based Approach, to appear in ACL-96.

Pong, T. Y. and J. S. Chang, 1993, A study of word segmentation and tagging for Chinese, Proceedings of
ROCLING VI, pp. 173-193 (In Chinese).

Yarowsky, David, 1995, Unsupervised Word Sense Disambiguation Rivaling Supervised Methods, In
Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, Cambridge,
Massachusetts.

Yu, Shiwen, 1995, Tagged Singapore Chinese School Texts, Paper R95001, CommCOLIPS, Vol 5,
pp. 81-86.

Wan, Huzhou, 1989, Comparison of Chinese and English Lexicon, published by Chinese Foreign
Economics and Trading Publisher.

Wu, Min Jie, 1982, A Handbook of Chinese and English Grammar, published by Zhishi Chubanshi,
Zhejiang, China (In Chinese).

Zernik, Uri, 1990, Tagging Word Senses in Corpus: The Needle In The Haystack Revisited, Technical
Report 90CRD198, GE R&D Center.

Zhang, Song Lin, 1986, Tabulated Grammar For Contemporary Chinese Language, published by Sichuan
Kexue Jishu Chubanshe (In Chinese).



